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Abstract 

The  main  purpose  in  this  paper  is  to  lay  a  theoretical  foun¬ 
dation  for  the  design  of  mesh-connected  processor  arrays 
for  the  transitive  closure  problem.  Using  a  simple  path- 
algebraic  formulation  of  the  problem  and  observing  its  simi¬ 
larity  to  certain  well-known  smoothing  problems  that  occur 
in  digital  signal  processing,  we  show  now  to  draw  upon 
existing  techniques  from  the  signal  processing  literature  to 
derive  regular  iterative  algorithms  for  determining  the  tran¬ 
sitive  closure  of  the  graph.  The  regular  iterative  algorithms 
that  are  derived  using  these  considerations,  are  then 
analyzed  and  synthesized  on  mesh-connected  processor 
arrays.  Among  the  vast  number  of  mesh-connected  proces¬ 
sor  arrays  that  can  be  designed  using  this  unified  approach, 
the  systolic  arrays  reported  in  the  literature  for  this  problem 
are  snown  to  be  special  cases. 

I. Introduction 

An  interesting  problem  that  often  arises  in  various 
graph  theoretic  applications  is  the  transitive  closure  prob¬ 
lem.  Here  the  objective  is  to  determine  whether  or  not  there 
exists  a  path  between  any  pair  of  nodes  in  a  directed  graph. 
This  problem  has  been  addressed  in  the  context  of  neural 
modeling,  routing,  decision  making,  circuit  simulation,  tran¬ 
sportation  problems  etc.  and  a  solution  to  this  problem  can 
be  easily  adapted  to  determine  the  shortest(longcst)  path 
between  two  nodes  in  a  weighted  directed  graph,  the 
number  of  paths  between  these  two  nodes  (if  the  graph  is 
acyclic),  for  enumerating  the  paths  themselves,  for  deter¬ 
mining  the  set  of  strongly  connected  components  in  the 
graph,  for  determining  the  minimum  spanning  trees  of  the 
graph,  for  determining  the  bridges  in  a  graph  and  so  on  [5]. 

In  the  usual  formulation  of  the  problem,  the  graph  is 
given  in  terms  of  its  adjacency  matrix  A,  where  m,  =■  l  if 
there  is  a  directed  arc  in  the  graph  from  node  i  to  node  j, 
and  0  otherwise.  Then,  the  transitive  closure  of  the  graph  is 
requested  in  the  form  of  the  matrix  T  where  t ,  =  1  if  there 
is  a  directed  path  in  the  graph  from  node  i  to  node  ;,  and  0 
otherwise. 

Systolic  Architectures  for  solving  this  problem  on  a 
mesh  connected  array  of  processors  have  been  derived  in 
Guibas,  Kung  and  Thompson  (1|  and  recently  in  Kting  and 
Lo(2|.  However,  using  the  recently  developed  methodology 
in  [3,4],  we  can  show  that  there  are  a  vast  number  of  sys¬ 
tolic  configuration-  for  solving  the  problem.  Indeed,  any 
systolic  array  solution  for  the  matrix  multiplication  problem 
can '  be  simrlv  adapted  to  solve  the  transitive  closure 
problem;  it  has  been  shown  in  (4)  that  there  are  27  different 
array  configurations  with  strictly  nearest  neighbour  inter¬ 
connections  for  matrix  multiplication.  Thus,  there  are  that 
many  array  configurations  for  solving  the  transitive  closure 
problem  as  well.  The  methodology  described  in  [3,4,17]  for 
deducing  these  array  configurations  consists  of  the  follow¬ 
ing  steps: 

i.  Determine  a  well-structured  iterative  algorithm  (what  is 
known  as  a  regular  iterative  algorithm  in  [3,4, 17))  for 
solving  the  problem. 
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ii.  Analyze  the  algorithm  and  obtain  various 
implementation-independent  properties  of  the  algo¬ 
rithm,  including  a  valid  computational  schedule,  and 

iii.  Map  the  algorithm  onto  a  mesh-connected  processor 
array  for  which  the  computed  schedule  can  be  applied. 

In  this  paper,  we  shall  mainly  concentrate  upon  the  first  of 
these  steps.  The  other  two  steps  are  iust  applications  of  the 
techniques  developed  in  [3,4,17)  and  we  shall  only  briefly 
discuss  them. 

Organization  of  the  Paper:  The  first  step  above  is  addressed 
in  Section  11.  Here  we  shall  show  how  to  derive  regular 
iterative  algorithms  for  solving  the  transitive  closure  prob¬ 
lem.  In  particular,  we  shall  obtain  a  mathematical  formula¬ 
tion  of  the  problem  so  as  to  expose  its  relation  to  a  certain 
well  known  problem  in  signal  processing.  Thereupon,  we 
show  how  to  use  some  established  techniques  in  that  field 
for  obtaining  regular  iterative  algorithms  for  solving  the 

rroblcm.  The  resulting  algorithms  comprise  those  given  in 
1]  and  [2]  as  special  cases.  Once  a  Regular  Iterative  Algo¬ 
rithm  is  found,  one  can  apply  the  techniques  outlined  in 
[3,4,17]  to  determine  implementations  for  it.  Some  exam¬ 
ples  arc  given  at  the  end  of  this  section,  though  the  actual 
details  can  be  found  in  [19]. 

In  the  interest  of  brevity,  all  the  theorems  and  procedures 
given  in  this  paper  are  stated  without  proof.  The  interested 
reader  is  referred  to  a  longer  version  of  this  paper[19),  for 
the  nitty-gritty  details. 

II.  Regular  Iterative  Algorithms  for  the  Transitive  Clo¬ 
sure  Problem 

The  design  of  regular  iterative  algorithms  for  the  tran¬ 
sitive  closure  problem  is  discussed  in  this  section.  An  itera¬ 
tive  algorithm  is  said  to  be  regular  if  it  can  be  expressed  as 
a  set  of  functional  relations  as  follows:  For  j  =  1  to  v  do: 

*i(l)  =  ■  ■  ■  Ml-Ov, J,  •  ) 

where  v  is  some  constant  integer  that  determines  the 
number  of  indexed  variables  used  in  the  algorithm  and  /  is 
an  j-dimcnsional  integer  vector  that  ranges  over  a 
prcspccificd  set  of  points  known  as  the  index  points.  The 
index  points  span  a  certain  5-dimcnsional  region  known  as 
the  Index  Space.  In  addition,  for  a  regular  iterative  algo¬ 
rithm,  the  vectors  P,.  arc  required  to  be  independent  of  the 
index  point  t  and  the  ‘extent’  of  the  Index  Space.  More  on 
regular  iterative  algorithms  and  their  properties  can  be 
found  in  [3, 17, IS]. 

In  the  usual  path-algebraic  formulation  of  the  transitive 
closure  problem,  one  determines  the  transitive  closure  of 
the  graph  by  solving  a  set  of  linear  equations[6-8]  over  a 
semi-ring.  Using  this  formulation,  manv  known  algorithms 
for  solving  linear  equations,  such  as  Gaussian  elimination, 
Gauss-Sicdcl  iteration  or  Gauss-Jordan  elimination  can  be 
simply  modified  to  solve  the  transitive  closure  problem. 
Ihc  resulting  algorithms  for  the  transitive  closure  problem 
arc  better  known  in  the  graph-theoretic  literature  as  Carre’s 
algorithm^],  Ford-Fulkcrson’s  algorithmj9|  and  Warshall’s 
algorithm]  10)  (or  Floyd's  algorithm  [11])  respectively. 
Furthermore,  these  algorithms  can  be  written  in  a  regular 


a 


iterative  format,  and  thus,  using  the  techniques  given  in 
[3 ,4 1 ,  one  can  easily  generate  a  multitude  of  mesh  connected 
processor  implementations  for  the  transitive  closure  prob¬ 
lem. 

Even  though  there  is  such  a  striking  relationship 
between  the  algorithms  given  in  (6-10|  and  certain  well- 
known  procedures  for  solving  linear  equations,  there  docs 
not  seem  to  be  an  obvious  correspondence  between  the 
transitive  closure  algorithm  of  Guibas,  Kung  and  Thomp¬ 
son!  1)  and  any  known  linear  algebraic  procedure.  In  this 
section,  we  shall  show  how  one  can  obtain  an  alternative 
formulation  of  the  transitive  closure  problem  that  bears  a 
marked  resemblance  with  certain  well  known  problems  in 
digital  signal  processing.  Thereupon,  using  certain  known 
signal  processing  techniques,  we  shall  derive  several  regular 
iterative  algorithms  for  solving  the  transitive  closure  prob¬ 
lem.  These  algorithms  have  no  direct  counterparts  in  linear 
algebra,  but  have  close  relationships  with  known  algorithms 
in  recursive  filtering.  Furthermore,  Guihas  ct.  al.'s  algo¬ 
rithm  is  a  special  case  of  the  algorithms  that  arc  derived  in 
this  section  using  this  approach.  Curiosly  enough,  even 
Warshall's  algorithm  can  be  derived  as  a  special  case  of 
these  algorithms. 

Not  only  can  the  transitive  closure  problem  be  formu¬ 
lated  as  a  set  of  linear  equations,  it  can  also  be  determined 
by  solving  a  set  of  Lyapunov  equations,  or  by  solving  a  set 
of  matrix  quadratic  equations.  For  the  purpose  of  this 
paper,  the  formulation  of  the  problem  as  a  set  of  quadratic 
equations  is  of  interest,  since  this  formulation  exposes  the 
similarity  of  the  transitive  closure  problem  with  certain 
image  smoothing  problems.  In  this  formulation,  the  element 
in  the  r’  row  and  the  j'"  column  of  the  transitive  closure 
matrix  of  the  graph,  T,  is  expressed  as 


*/  -  'L'w 


where  ‘.’  represents  the  logical  OR  operation  and  ‘V 
represents  the  logical  AND  operation.  The  above  expression 
is  merely  an  algebraic  statement  of  the  obvious  fact  that  a 
path  from  node  i  to  node  j  exists  if  and  only  if  there  exists 
either  an  edge  from  i  to  j  (i.c.  ■ ,  =  l)  or  if  there  exists  a 
path  from  i  to  k  and  k  to  j  for  some  k. 

The  recursive  computation  of  a  matrix  as  suggested  by 
eqn.(l)  is  similar  to  certain  computations  that  arise  in  the 
recursive  smoothing  of  digital  signals  [  1 2 J .  Thus,  it  is  only 
natural  to  ask  whether  the  techniques  used  in  signal  pro¬ 
cessing  for  solving  such  problems  can  be  adapted  for  the 
transitive  closure  problem.  The  main  objective  in  this  sec¬ 
tion  is  to  show  that  this  is  indeed  so. 


tivc  closure  problem  by  computing  the  entries  of  the  T 
matrix  using  cqn.  (1),  no  matter  how  we  order  the  compu¬ 
tation.  This  is  because  of  the  cyclic  definition  of  the  ele¬ 
ments  of  the  matrix  in  that,  e.g.,  t,  is  dependent  upon  say, 
t.,  but  t;  is  dependent  upon  t,  for  any  k  *  j  (See  Fig.l). 
This  ‘nonrccursiblc’  situation  often  arises  typically  in  the 
recursive  smoothing  of  (noisy)  digital  signals  wherein  it  is 
usually  handled  using  one  of  two  main  approaches: 

i.  Extended  Filtering:  For  the  transitive  closure  problem, 
this  technique  would  correspond  to  solving  a  simpler 
‘rccursiblc’  problem  for  an  extended  graph. 

ii.  Fonvards-Backwards  Filtering:  In  this  approach,  one 
would  attempt  to  recast  cqn.(l)  in  a  recursiblc  form 
such  that  the  problem  can  be  solved  by  (possibly 
repeated)  ‘forward’  and  ‘backward’  passes  through  the 
adjacency  matrix. 

We  shall  mainly  elaborate  upon  the  first  approach  in 
the  rest  of  this  section.  Thus,  we  shall  show  that  by  solving 
a  rccursiblc  problem  for  a  suitably  extended  graph,  we  can 
obtain  the  transitive  closure  of  the  given  graph.  In  addition, 
we  shall  also  characterize  a  family  of  extended  graphs  for 
which  this  property  holds.  For  a  graph  with  E  directed  arcs, 
this  family  has  more  than  2-'e  members.  For  this  family  of 
extensions,  we  shall  present  an  algorithm  that  can  be 
applied  on  any  member,  to  obtain  the  transitive  closure  of 
the  given  graph.  We  shall  show  that  this  algorithm,  when 
applied  to  one  particular  member  of  this  family,  can  be 
related  to  the  algorithm  due  to  Guibas,  Kung  and  Thomp¬ 
son  [1|.  We  shall  also  show  that  the  algorithm  when  spe¬ 
cialized  to  vet  another  member  of  this  family,  after  suitable 
‘folding’  of  the  computation,  is  exactly  the  algorithm  due  to 
WarshallllO]  which  was  recently  implemented  on  a  systolic 
array  in  [2].  It  is  interesting  to  note  that  algorithm  in  (10), 
though  it  can  be  derived  as  a  special  case  of  the  extended 
graph  approach,  really  solves  the  problem  by  performing 
recursive  computations  using  forward  and  backward  passes 
through  the  adjacency  matrix. 

Before,  we  proceed  to  postulate  the  extended  graph 
problem  per  re.  we  need  to  define  the  concept  of  a  t-paih 
(see  e.g  UllmanlS]). 

Definition:  t  path:  A  path  from  node  i  to  node  j  (or  node  j 
to  node  i)  is  said  to  be  a  /-path  if  every  node  n,  that  lies 
along  this  path,  excluding  the  end  points,  has  index  less 
than  /,  i.c.,  it  satisfies  n  <  t. 

I 

For  the  present,  we  shall  be  interested  in  the  case 
i  =  min(i.j).  This  is  because  a  mm(i,;)-path  has  the  important 
property  that  it  is  either  a  direct  edge  or  is  composed  of  a 
mm(i,*)-path  and  a  minfi^J-path  for  some  k  <  min(ij). 
Indeed,  choose  k  to  be  the  maximum  node  index  along  the 
minii .]  )-path  and  consider  the  sub-path  from  i  to  k.  By 
choice.  *  is  larger  than  any  node  index  in  this  sub-path 
(excluding  i)  and  consequently  this  is  a  mi/i(i,*)-path  from  i 
to  k.  Likewise,  the  k  to  i  path  is  also  a  mm(*,/)-path,  thereby 
verifying  our  statement.  Clearly,  the  converse  statement  is 
also  true.  That  is,  if  there  is  a  mm(i,t)-path  from  i  to  k  and  a 
mint k . i ;-path  from  k  to  ;  for  some  k  <  mini ij),  or  if  there  is  a 
directed  arc  from  i  to  i,  then  there  is  a  mm(i,7).path  from  i 
to  j.  Writing  this  up  algebraically,  we  have 
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Fig.l.  The  csclic  nature  of  the  definition  in  cqn.  (I).  The 

darkened  lines  indicate  the  entries  of  T  that  have  to  be  com-* 
puted  hctorc  the  typical  element  marked  -  can  be  comput¬ 
ed.  In  the  im.uic  processing  literature  1 1 3 - 1 4 1 ,  this  would  bo 
referred  to  as  the  output  maik  of  the  recursive  filter.  I 


A  close  examination  of  the  system  of  equations  defined 
by  cqn.  (1)  reveals  that  it  is  impossible  to  solve  the  transi- 


which  is  merely  a  mathematical  formulation  of  our  earlier 
statement.  1  he  above  equation  clearly  shows  that  the  ele¬ 
ment  c.  of  the  matrix  C  is  dependent  only  upon  elements 
that  are  in  the  same  column  above  it  and  in  the  same  row  to 
the  left  of  it.  (Note  its  similarity  with  the  so-called 
‘quarter-plane’  two-dimensional  recursive  filter  (13-14).) 
Thus,  the  computation  of  these  elements  can  be  performed 
iteratively  starting  from  the  top  left  hand  corner  of  the 
matrix  C  and  proceeding  downwards  and  outwards. 

The  existence  and  uniqueness  of  the  solution  to  the  set 
of  equations  :n  (2)  can  be  easily  established  as  shown  in 


[  19] .  The  problem  of  determining  the  existence  of  a 
mm(i.;)-path  from  node  i  to  node  j  for  every  (i,y)  is,  there¬ 
fore,  quite  easy  to  solve.  Moreover,  as  shown  below,  this 
problem  can  be  solved  by  means  of  a  regular  iterative  algo¬ 
rithm;  thus  the  techniques  developed  in  [3,4]  for  analyzing 
these  algorithms  and  for  implementing  them  on  mesh  con¬ 
nected  processor  arrays  can  be  applied.  Later  in  this  section, 
we  shall  show  how  to  extend  the  given  graph  using  addi¬ 
tional  .v  nodes  (and  some  arcs)  in  such  a  manner  that  by 
using  the  information  about  the  existence  of  mm(ij)-paths  in 
this  2.V-nodc  extended  graph,  one  can  infer  the  transitive 
closure  of  the  original  graph. 

a.  Determining  the  Min(ivj)-paths  of  a  Directed  Graph 
The  system  of  equations  for  determining  the  min(ij)-paths 
of  a  directed  graph,  given  in  (2)  above,  resembles  in  a 
modified,  simplified  form,  the  set  of  relations  that  is  usually 
associated  with  ‘quarter-plane'  recursive  filtering  in  the  sig¬ 
nal  processing  literature  [13-14).  Since  this  filtering  problem 
can  be  solved  by  means  of  a  regular  iterative  algorithm  [4], 
it  should  be  possible  to  determine  the  mm(i ,))-paths  of  the 
graph  using  an  algorithm  of  the  same  form.  Such  an  algo¬ 
rithm  is  given  below; 

Algorithm  1:  for  determining  the  min(i j)-paths  of  a 
directed  graph:  For  ij,k  =  1  to  .V  do: 

a(i,j,k)  if  j*k 

a(‘J  +  l.*0  -  c(iJ,k)  ■'  a(ij,k)‘b(ij,k)  il  j  —  k 


b(i,j,k)  if  i*k 

b(i+l.j,k)  m  c(,- j.jfc)  ,  a(i,j,kyb(i,j,k)  if  i“* 


c(i.j,* i-l)  =  c(i,j,k)  /  [a(i,j,k)’b(i.j,k)]  (3) 

with  the  initialization 

a(i,l.k)  =  0 
b(lj.k)  =  0 

cfij.l)  =  (4) 

for  ail  I  .j.k  =  1  io  .V.  Then  c(ij,.v+i)  -  l  if  there  is  a- 
mint i  ,y)-path  from  node  i  to  node  j  and  is  equal  to  zero  oth¬ 
erwise. 

Some  Variations:  It  is  possible  to  relate  the  above  algorithm 
to  the  ‘first  pass’  of  the  systolic  algorithm  in  [1].  However, 
in  (1|,  the  adjacency  matrix  of  the  graph  is  input  twice  to 
the  array,  once  from  the  left  end  of  the  array  and  once  from 
the  top  of  the  array.  In  the  above  algorithm,  the  adjacency 
matrix  is  input  only  once,  i.c.,  by  initializing  r(ij.l).  This 
would  indicate  that  there  may  be  other  possible  initializa¬ 
tions  for  the  algorithm  that  achieve  the  same  objective.  For 
example,  the  initialization  of  the  systolic  array  in  jl] 
corresponds  to  the  following: 

a(i.l.k)  = 

b{\.j.k) 

<•(■•/. 1)  -  0  (5) 

where  it  is  assumed  that  a  -  1.  We  can  show  that  for  this 
initialization,  or  if  in  can. (5)  m  y. I)  is  also  made  equal  to 
»  .  the  algorithm  can  be  used  to  obtain  more  information 
than  gist  the  "mti./j-pnths  in  the  graph.  In  the  following 
theorem,  a  weaker  version  of  which  appears  in  [2|,  the 
nature  of  the  paths  detected  bv  the  algorithm  tor  this  initial¬ 
ization  is  described . 

Theorem  I:  If  Algorithm  1  is  executed  with  the  initializa¬ 
tion  ot  cqn.  (5),  then  <-(i j..V »  I )  will  he  1  if  either 

i)  there  is  a  nm(i.;)-path  from  node  i  to  node  y,  or 


ii)  if  there  is  a  path  of  length  2  from  node  i  to  node  j,  or 

iii)  for  <  >  j,  if  there  is  a  decreasing  index  path  from  i  to 
some  k,  where  i  >  k  >  j,  and  a  mm(*j)-path  from  k  to  j, 
or 

iv)  for  j  >  i,  if  there  is  a  mmfi.ij-path  from  i  to  some  k, 
where  j  >  k  >  i,  and  an  increasing  index  path  from  k  to 
J- 

If  none  of  these  paths  exist  from  node  i  to  node  j,  then 
c(ij,.V+l)  will  be  zero. 

In  contrast  to  the  initialization  used  in  Algorithm  1,  the 
above  initialization  allows  us  to  obtain  more  information 
than  just  the  wm(i,y)-paths  in  the  graph.  Nevertheless,  this 
extra  information  is  never  used  in  the  algorithms  described 
in  the  rest  of  this  section.  Thus,  if  one  is  interested  only  in 
the  min(ij) -paths,  in  both  Algorithm  1  and  in  the  above  vari¬ 
ation,  one  could  consider  as  the  outputs  of 


the  algorithm.  Furthermore,  for  determining  minfijj-paths 
in  the  graph,  one  can  devise  a  whole  variety  of  possible  ini¬ 
tializations  of  Algorithm  1.  To  be  specific. 


Theorem  2:  If  the  inputs  to  Algorithm  1  are  chosen  such 
that 

a(i,l.y)c(i  J,l)  =  i,  for  i>j 
b(\j.i)  c(i.j.l)  -  »,  for  j>i 
c(M.l)  (a(i,l,i!'ft(l.i,i)]  -  1  (6) 

then  rti.j.nnh  ; )  -  ] )  will  be  1  if  there  is  a  mm(i.;).path  from 
node  i  to  node  j  and  will  be  zero  otherwise. 

Theorem  2  opens  up  several  interesting  possibilities. 
Thus,  for  example,  in  the  systolic  array  ofViuibas,  Kung 
and  1  hompsonj  1 ),  one  docs  not  necessarily  have  to  input 
the  adiacencv  matrix  twice,  as  described  bv  the  authors. 
Instead.  again  as  an  example,  the  lower  triangular  portion 
of  the  adiacencv  matrix  can  be  input  from  the  top  of  the 
arrav  and  the  upper  triangular  portion  from  the  left  of  the 
arrav  I  hese  and  other  variations  arc  discussed  in  detail  in 
Section  IV  of  this  [taper. 

b.  The  Fxtcndcd  Graph  Approach  to  the  Transitive  Clo¬ 
sure  Problem 

Knowing  how  to  determine  the  nmfi.;)-paths  in  a  graph 
using  a  regular  iterative  algorithm,  we  can  now  solve  the 
transitive  closure  problem  itself.  To  do  so,  we  shall  attempt 


to  extend  the  graph  in  such  a  manner  that,  solving  the 
nm(i.;)-path  problem  for  this  extended  graph  is  equivalent 
to  solving  the  transitive  closure  problem  for  the  original 
graph.  Even  though  this  idea  of  extending  the  domain  of 
the  problem  is  borrowed  from  the  signal  processing  litera¬ 
ture,  the  basis  for  this  approach  can  be  independently 
derived  using  the  following  simple  observation: 

Given  a  directed  graph  with  .v  nodes,  suppose  that  we 
introduce  additional  ,V  nodes  numbered  from  ( .v  —  1 )  through 
2.V  such  that  node  (,V  +  i)  has  one  incoming  arc  from  node  i 
and  one  outgoing  arc  to  node  i  and  no  other  arcs  incident 
upon  it.  For  this  modified  graph,  we  claim  that  if  there  is  a 
min(.V*i..V+y)-path  from  node  t.V-n)  to  node  (.V -*->),  then  and 
only  then  is  tncrc  a  path  in  the  original  graph  from  node  i 
to  node  ;.  This  statement  is  easily  verified  since  all  paths 
between  these  two  nodes  have  to  pass  through  node  i  and  I 
node  ],  and  since  every  path  in  the  original  graph  between 
node  I  and  node  j  has  to  be  part  of  a  mi/i(.v-t-i,.v+y)-path 
between  node  (.v  +  i)  and  node  (.v+j). 

The  iv-nodc  extended  graph  described  above  has  an 
adjacency  matrix  A,  given  by 

*  -  [l  i)  (7> 

where  I  is  the  (.vx.v)  identity  matrix. 

This  is  a  rather  simple  extension  of  the  original  graph 
and  was  derived  using  heuristic  arguments.  Instead,  one  can 
pose  the  the  extension  problem  formally  as  follows: 

The  Extension  Problem:  Given  an  .V-nodc  directed  graph 
with  adjacency  matrix  A,  determine  an  extension  of  this 
graph  with  adjacency  matrix 


with  appropriate  (.Vx,\)  matrices  A  :,  A;  and  A;;  such  that 
for  i.j  *=  ,V,  there  is  a  min(.V*i..V~;)-path  from  node  (.V-m)  to 
node  IS+j)  in  the  extended  graph  if  and  only  if  there  is  a 
directed  path  from  node  i  to  node  j  in  the  original  graph. 

Charactcri.  alion  of  Useful  Solutions  to  the  Extension 
Problem: 

An  useful  solution  to  the  extension  problem  is  one  that 
can  be  ‘easily’  inferred  from  the  original  graph.  Here,  we 
shall  assume  that  the  extension  can  be  easilv  inferred  if  it 
can  be  deduced  from  the  adjacency  matrix  of  the  graph  or 
from  the  nin(i.y)-paths  in  the  graph.  Such  a  family  of  solu¬ 
tions  to  the  extension  problem  can  be  characterized  as  fol¬ 
lows: 

Theorem  3:  A  graph  with  adjacency  matrix  given  as  in  cqn. 
(17)  is  a  solution  to  the  extension  problem  if 


A  . 

1  -  A; 

A.. 

1  -  A 

A  _ 

I  -  'V: 

(9) 

A  T  -  A 

T  -  A  T  -  T 

(10) 

where  the  hr  of  two  matrices  indicates  the  term  bv  term 
logical  (  .?  ('I  their  entries.  Further,  m  this  extended  graph, 
the  existence  ot  a  path  from  node  Y-u  to  node '  i  v  —  • ) 
impucs  the  existence  of  a  »ua(  v  -  i..v»;)-path  from  node 
(.v  -  ii  to  node  (  v  ♦  r  >. 

1  he  extended  graph  can  be  chosen  in  several  wavs  For 
instance,  the  choice  \  -  A  •  -  A  -  I  that  we  discussed  ear¬ 
lier,  satisfies  the  conditions  in  Theorem  3.  Likewise,  the 


choice  a;  -  A  ;  -  A;;  =  A  is  also  valid.  Alternately,  we 
could  first  determine  the  i-paths  in  the  original  graph 

and  use  this  information  for  initializing  the  extended  graph. 
Note  that  once  the  extended  graph  is  chosen,  Algorithm  1 
or  any  of  its  variations  can  be  applied  on  the  extended 
graph  so  as  to  obtain  the  transitive  closure  of  the  original 
graph,  even  though  some  of  the  variations  described  earlier 
determine  more  than  just  the  ™n(i.y)-paths  in  the  graph. 
This  is  because  of  the  fact  (stated  in  Theorem  3)  that  for 
every  path  in  the  extended  graph  from  node  (.v  —  < )  to  node 
f.v  +  j),  there  exists  a  corresponding  run-path.  Furthermore, 
since  we  arc  only  interested  in  the  mm(.v-i..v-y)-paths,  the 
transitive  closure  information  is  also  obtained  as 
e(fY  +  i„V  +  j,.V  +  min(i  j)*l)  =  t 

The  above  statements  were  made  without  taking  into 
consideration  the  structure  of  the  extended  graph.  Using  the 
structure  in  the  graph,  one  would  hope  that  determining  the 
existence  of  mm(.v-<.v-*-y)-paths  from  node  (,v*i)  to  node 
TV*;)  is  ‘easier’  in  some  sense.  Indeed  this  is  true  due  to  the 
following  result: 

Theorem  4:  If  Algorithm  1  is  executed  on  any  member  of 
the  family  of  extended  graphs  characterized  in  Theorem  3, 
then 

a)  For  i.j  ■  .V,  if  i  ■  j,  and  if  there  exists  a  m<u(/,y)-path 

from  node  i  to  node  j  in  the  original  graph,  then 

c(i,.V-y,i-e  1)  =  1. 

b)  For  i.j  it  .V,  if  j  >  i,  and  if  there  exists  a  mojt(i.y)-path 

from  node  i  to  node  j  in  the  original  graph,  then 

c(.V-<.;,y  +  l)  =  1. 

c)  Finally,  c(,V*i,.V*j„V+ 1)  =  t  ,  for  all  i.j  5  ,v. 

With  this  theoretical  background  established,  we  can  state 
the  basic  algorithm  for  determining  the  transitive  closure  of 
the  given  graph  using  the  extended  graph  approach.  Later, 
we  shall  particularize  the  algorithm  to  certain  special  cases 
(and  deduce  some  variations)  so  as  to  obtain,  c.g.,  the  algo¬ 
rithms  reported  in  [1.2], 

Algorithm  2:(for  determining  the  transitive  closure  of  a 
directed  graph): 

i.  Choose  an  extension  of  the  graph  using  Theorems  3,4. 

ii.  Execute  Algorithm  1  for  the  extended  graph  with  adja¬ 
cency  matrix  A  and  over  the  index  space  ij  =  1  to  IV 
and  *  =  1  to  min(ij,\). 

iii.  Determine  the  transitive  closure,  {t  },  of  the  original 
graph  as  {  c (,V  +  i,.V-i-y,.V-t- 1)}  for  all  ij  is  ,v. 

c.  Some  Variations  of  Algorithm  2  and  Some  Special 
Graph  Extensions 

Given  the  basic  algorithm  for  solving  the  transitive  clo¬ 
sure  problem,  w:  Algorithm  2,  one  can  easily  come  up  with 
variations  based  on  some  simpic  observations.  In  this 
manner,  we  shall  presently  show  how  to  deduce  the  algo¬ 
rithms  given  in  [1|  and  [2f  as  special  cases  of  the  extended 
graph  approach. 

Suppose  that  Algorithm  2  is  executed  and  the  values  of 
((i.i.™ii.)..\>I)  arc  collected  in  matrix  form  in  the  obvious 
order,  for  i ./  -  l  to  2.V.  Call  the  resulting  (1V\2,V)  matrix  C, 
and  partition  it  into  i.Vx.V)  blocks  as 


Then,  from  Theorem  4,  it  is  known  that 

C  -  T  (12) 

and 

C;.xC  :  -  T  (13) 


where  the  symbol  ‘x’  signifies  the  logical  product  of  the 
two  matrices.  (The  logical  product  is  the  same  as  the  usual 
matrix  product  except  that  scalar  addition  is  replaced  by  the 
logical  OR  operation  and  scalar  multiplication  by  the  logical 
A.vo  operation.)  Furthermore,  we  know  that 


T  =  C  T  =  T 


and  from  the  definition  of  the  transitive  closure  matrix, 

TxT  =  T 


Therefore,  one  can  show  that 


{c;C,}x{C.;-C,j  »  T 


Now,  in  Algorithm  2,  if  it  is  applied  to  the  extended  graph 
without  modification,  one  evaluates  C  ;  and  C-  separately 
and  then  finds  the  logical  product  of  the  two  to  determine 
the  transitive  closure  of  the  original  graph.  However, 
eqn.(16)  implies  that  we  do  not  need  to  determine  these 
matrices  separately.  The  logical  OR  of  these  matrices  is  suf¬ 
ficient  for  our  purpose.  This  observation  is,  in  spirit,  the 
basis  of  the  systolic  transitive  closure  algorithm  of  Guibas, 
Kung  and  Thompson[l],  though  these  authors  do  not  derive 
their  algorithm  using  such  considerations. 

Certain  choices  of  the  extended  graph  are  ‘natural’  and 
need  to  be  examined  in  some  detail.  Thus,  the  choice 
A j  =  A-  =  A--  -  I,  discussed  previously,  is  a  simple  and 
straight-forward  solution  to  the  extension  problem.  For 
such  a  choice,  the  matrix  C,  defined  above,  is  such  that  C  ; 
is  lower  triangular  with  ones  on  the  diagonal,  and  C;  is 
upper  triangular  with  ones  on  the  diagonal.  This  fact  can 
be  easily  verified  using  the  observation  that  c(M  +  ijj  + 1)  is  1 
if  there  is  a  ™«f.v-;,;)-path  from  node  f.v+i)  to  node  j  in  the 
extended  graph  and  is  0  otherwise.  Now,  for  the  above 
choice  of  the  extended  graph,  any  path  from  node  (.V+i)  to 
node  i  has  to  pass  through  node  ;  and  therefore,  if  i  >  j,  all 
these  paths  arc  not  mm(.V-‘-i,y)-paths.  Hence  c(N+ijj  +  l)  =  0 
for  i  >  j,  which  substantiates  our  statement  that  C;  is  upper 
triangular.  Similarly,  one  can  conclude  that  C_;  is  lower  tri¬ 
angular. 

For  this  particular  choice  of  the  extended  graph,  the 
matrix  C  ;  is  lower  triangular  while  the  matrix  C;  is  upper 
triangular.  However,  for  any  arbitrary  choice  of  the  exten¬ 
sion.  these  matrices  need  not  be  triangular.  Nevertheless, 
we  can  show  that  the  relevant  triangular  portions  of  the 
resulting  matrices  are  of  interest  because  of  the  following 
theorem. 

Theorem  S:  For  any  choice  of  the  extended  graph  satisfying 
Theorem  3,  let 


C;.  -  I;.  I  l';.  (17) 

where  the  I.'s  arc  lower  triangular  and  l's  arc  upper  tri¬ 
angular  matrices.  Define 

L  -  I.  I 


V  -  L  I  (18) 

Then,  the  transitive  closure  of  the  original  graph,  T  is  given 
by 

T  -  l  xL  (19) 

Notice  how  the  structure  of  the  extended  graph  has 
been  used  to  reduce  the  computational  effort  involved  in 
Algorithm  2.  I  bus.  in  Theorem  4,  wc  showed  that  instead 
of  considering  for  the  transitive  closure  of 


the  graph,  one  can  instead  use  c(i,j,min(i,j,X)  +  1).  This 
reduces  the  total  amount  of  computation  by  approximately 
.vV 3  logical  AM)  and  logical  OR  operations.  Next,  in 
Theorem  5,  we  showed  that  we  do  not  need  to  compute  the 
matrices  C  •  and  C:  entirely.  Only  the  lower  triangular  por¬ 
tion  of  Cj  and  the  upper  triangular  portion  of  C;.  are  of 
relevance.  This  further  reduces  the  total  computational 
effort  by  about  ,vV3  logical  operations.  Thus,  using  these 
theorems,  we  have  reduced  the  computational  effort  from 
8.V-V3  operations  to  2.VJ  operations.  Nevertheless,  for  a 
sequential  implementation,  this  is  still  about  a  factor  of  2 
greater  than  the  computational  effort  required  for 
Warshall’s  a!gorithm[10].  However,  Warshall's  algorithm 
can  be  derived  using  the  extended  graph  approach  by  con¬ 
sidering  the  graph  extension  for  which  A  ;  =  A:.  =  A;;  =  A. 
An  examination  of  the  operation  of  Algorithm  2  for  this 
extension  of  the  graph  reveals  that  some  of  the  intermediate 
variables  computed  by  the  algorithm  are  identical  and  there¬ 
fore  the  computation  of  these  variables  need  not  be 
repeated.  In  the  algorithm,  <-(A'  +  i..v+_/,,v  + 1)  is  obtained  by 
computing  iteratively  for  k  -  1  to  N , 

c(X  +  i,H-rj,k+\)  =  c(N-i.X  +  j,k)J 

(o(.V-*-  j  ,k)' b(X  +  iji+j.k)]  '  * 

Now,  the  variables  a(X+i.X *-) .k)  and  b(.V~i.N*-j,k)  in  eqn. 
(20)  are  really  dummy  variables  that  represent  c(X+i,k,k+\) 
and  c(k.S*j.k+ 1)  respectively.  However,  for  the  special 
choice  of  the  extended  graph  under  consideration,  we  can 
show  after  some  algebraic  manipulations,  that 
c(X  +  i.k.k+  1)  =  c(.V  +  i,N*k,k)  and  c(k.X+j,k+\)  =  r(.V  +  * + j ,k) 
and  therefore, 

c(N+i,N+j,k  +  1)  =  c(N  +  i,X  ±  j  ,k)‘ 

[c(.V  +  i  ,.V  c(.V  +  *  „V  -r;  ,* )  |  1  > 

This  is  precisely  the  transitive  closure  algorithm  due  to 
Warshall|  10]  (  a  shortest-path  version  of  the  algorithm  is 
due  to  Floy d(  11]  and  a  minor  modification  of  the  algorithm, 
in  its  shortest  path  version,  is  attributed  to  Dantzig[  15)), 
which  was  recently  implemented  in  a  systolic  array  by  Kung 
and  Lo[2). 

Clearly,  for  a  sequential  implementation,  the  above 
algorithm  requires  .V3  logical  OR  and  logical  AXD  operations 
which  implies  a  factor  of  two  reduction  in  computational 
effort.  Nevertheless,  the  amount  of  parallel  time  (from 
first  input  to  last  output)  required  for  the  (.Vx,V)  systolic 
array  in  [1]  and  the  (.vx.v)  systolic  array  in  [2|  is  the  same 
and  equal  to  ;.v.  This  is  because  Warshall’s  algorithm 
requires  forward  and  backward  processing  of  the  data  (both 
increasing  and  decreasing  ij  indices)  whereas  in  the  general 
case,  in  Algorithm  2,  the  data  is  always  propagated  in  the 
forward  (increasing  i.j  indices)  direction.  Thus,  in  order  to 
implement  Warshall's  algorithm  on  a  systolic  array,  the 
authors  in  [2]  had  to  introduce  a  feed-back  loop  which 
renders  the  iteration  interval,  i.c.,  the  time  between  succes¬ 
sive  data  sam,  les,  to  be  3,  whereas  the  iteration  interval  for 
the  array  in  (1]  is  1.  Thus,  the  total  time  taken  is  the  same 
in  both  cases,  even  though  the  systolic  array  in  [1]  requires 
three  passes  for  computing  the  transitive  closure,  whereas 
the  array  in  [2]  requires  only  one  pass. 


Implementation  of  the  Regular  Iterative  Algorithm 

Following  the  procedure  given  in  [3,17|,  let  us  define  a 
number  of  ‘parallel  iteration  steps'  for  the  algorithm  in  the 
following  lashion:  at  iteration  step  r,  variables  aUj.k), 
bU.j.k)  and  rti.j.k )  will  be  computed  if  +  k  -t.  The 
expression  ti*-;-i-)  is  the  so-cjllcd  linear  scheduling  function 
for  these  variables  (3|,  since  it  maps  each  variable  at  any 
index  point  into  a  unmuc  iteration  step.  Furthermore,  the 
computation  of  the  variables  at  iteration  step  t  requires,  as 
input,  the  values  of  variables  that  arc  computed  at  iteration 
step  i-  I,  since  e.g..  a(i.j.k)  is  dependent  upon  aU.j  - 1  ,k)  and 
so  on.  Therefore,  if  we  can  schedule  the  computation  of  the 


algorithm  such  that  all  variables  at  step  t-1  are  computed 
before  we  proceed  to  step  /,  then  all  precedence  constraints 
will  be  satisfied. 

Having  obtained  the  linear  scheduling  functions  for  the 
algorithm,  we  can  now  proceed  to  systematically  implement 
Algorithm  2  on  a  mcsh-conncctcd  processor  array  using  the 
techniques  described  in  [4,17|.  To  do  so,  note  that  the 
index-space  for  Algorithm  2,  i.e.  the  span  of  the  indices 
for  the  variables  in  the  algorithm,  can  be  described  by 
the  set  of  constraints 


lsisiv 

[  max(i,N)-N  +  1  ]sjs[  min(W,i)-*-.V  ] 

lsts|  min(iJ,N )  ]  (22) 

where  the  variation  of  Algorithm  2,  suggested  by  Theorem 
5  is  assumed.  Thus,  a  streamlined  version  of  Algorithm  2 
has  the  index  space  shown  in  Fig. 3.  Any  point  with  integer 
coordinates  (i i,j,k),  that  lies  within  this  space,  is  referred  to 
as  an  index  point.  The  index  point  (ij.k)  conceptually 
represents  the  computation  described  by  the  iteration  unit 
of  the  algorithm.  Thus,  at  this  index  point,  the  variables 
a(i.j.k),  b(i.j.k)  and  c(i,j,k )  are  made  available  to  the  algo¬ 
rithm  and  after  some  time,  the  variables  a(i.j-el.k),  b(i+ij,k ) 
and  c(i  j.i+l)  are  computed  and  sent  to  the  appropriate 
index  points. 


Fig. 3:  The  index  space  for  a  variation  of  Algorithm  2:  Thel 
three  dimensional  solid  shown  above  is  specified  by  the  set, 
of  constraints  in  cqn.(22). 

Just  as  an  example,  let  us  now  assume  that  all  variables 
with  the  same  t-index  are  executed  by  the  same  processor. 
In  terms  used  in  [3,4],  the  iteration-space  is  [0  0  1],  This 
defines  a  two-dimensional  processor  array  (shown  in  Fig. 4) 
with  approximately  3.V*  processors.  In  this  array,  the  a  and 
b  values  are  propagated  to  the  neighbouring  processors 
while  c(ij.k)  represents  a  register  value  that  is  updated 
within  processor  at  location  (ij). 

In  the  array  shown  in  Fig. 4,  the  processor  at  location 
(ij)  computes  all  variables  with  index  d.j.k)  at  ‘time’ 
0+j  +  k).  Using  this  fact  and  by  examining  the  set  of  con¬ 
straints  that  determine  the  index-space,  one  can  argue  that 
the  processors  at  locations  (ij),  (i  +Nj)  for  i  =  y  ts  n, 
for  ;sisi(,  and  at  location  (i+N.j+N)  are  active  during 
disjoint  intervals  of  time.  The  operations  of  all  these  proces¬ 
sors  can  therefore  be  simulated  on  a  single  one;  the  result¬ 
ing  array,  shown  in  Fig. 5,  is  precisely  the  array  derived  in 
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